
    Biomedical data retrieval utilizing textual data in a gene expression database by Richard Lu, MD.

    Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 68-74). Background: The commoditization of high-throughput gene expression sequencing and microarrays has led to a proliferation of both genomic and clinical data. Descriptive textual information deposited with gene expression data in the Gene Expression Omnibus (GEO) is an underutilized resource because it is unstructured and difficult to query. Rendering this information in a structured format using standard medical terms would facilitate better searching and data reuse, and would substantially increase the clinical utility of biomedical data repositories. Methods: The thesis is divided into two parts. The first compares how well four medical terminologies represent the textual information deposited in GEO. The second implements free-text search and faceted search and evaluates how well each answers clinical queries of varying complexity. Part I: A total of 120 samples were randomly extracted from GEO samples in six clinical domains: breast cancer, colon cancer, rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), type I diabetes mellitus (IDDM), and asthma. These samples had previously been annotated manually, yielding structured textual information in a tag:value format. The data were mapped to four controlled terminologies: NCI Thesaurus, MeSH, SNOMED-CT, and ICD-10. Each sample was scored on a three-point scale according to how well the terminology represented its descriptive textual information. Part II: Faceted and free-text search tools were implemented over 300 GEO samples. Eight natural-language search questions were selected at random from scientific journals.
Academic researchers were recruited and asked to use the faceted and free-text search tools to locate samples matching the question criteria. Precision, recall, F-score, and search time were compared between free-text and faceted search. Results: The NCI Thesaurus consistently ranked as the most comprehensive terminology across all domains, while ICD-10 consistently ranked as the least comprehensive. Using the faceted search tool augmented with NCI Thesaurus terms, every researcher reached 100% precision and recall (F-score 1.0) on each of the eight search questions. Using free-text search, test users averaged 22.8% precision, 60.7% recall, and an F-score of 0.282. The mean search time per question was 116.7 seconds with faceted search and 138.4 seconds with free-text search; this difference was not statistically significant (p = 0.734). However, paired t-tests showed statistically significant differences between the two search strategies in precision (p = 0.001), recall (p = 0.042), and F-score (p < 0.001). Conclusion: This work demonstrates that biomedical terms included in a gene expression database can be adequately expressed using the NCI Thesaurus. It also shows that faceted searching with a controlled terminology outperforms conventional free-text searching when answering queries of varying complexity.
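
The contrast between faceted and free-text retrieval, and how precision, recall and F-score are scored against a gold-standard sample set, can be illustrated with a toy sketch. The sample IDs, tags and descriptions below are hypothetical, and the keyword matcher is an assumed naive baseline rather than the thesis's actual search tool. It shows two typical free-text failures: a synonym ("SLE") that a keyword search for "lupus" misses, and a negated mention ("no history of lupus") that it wrongly matches.

```python
# Hypothetical GEO-style samples: structured tag:value annotations already mapped
# to controlled-vocabulary terms (facets), plus each sample's original free text.
samples: dict[str, dict[str, str]] = {
    "GSM_A": {"disease": "Systemic Lupus Erythematosus", "tissue": "Blood"},
    "GSM_B": {"disease": "Rheumatoid Arthritis", "tissue": "Synovium"},
    "GSM_C": {"disease": "Rheumatoid Arthritis", "tissue": "Blood"},
}
texts: dict[str, str] = {
    "GSM_A": "SLE patient, peripheral blood",
    "GSM_B": "RA synovial tissue; no history of lupus",
    "GSM_C": "rheumatoid arthritis, whole blood",
}


def faceted_search(facets: dict[str, str]) -> set[str]:
    """IDs of samples whose annotations match every requested tag:value facet."""
    return {
        sid
        for sid, tags in samples.items()
        if all(tags.get(tag) == value for tag, value in facets.items())
    }


def free_text_search(keywords: list[str]) -> set[str]:
    """Naive baseline: IDs whose description contains every keyword (case-insensitive)."""
    return {
        sid
        for sid, text in texts.items()
        if all(k.lower() in text.lower() for k in keywords)
    }


def precision_recall_f(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F-score (the harmonic mean of precision and recall)."""
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Query: "blood samples from SLE patients"; the gold standard is GSM_A only.
relevant = {"GSM_A"}
faceted = precision_recall_f(
    faceted_search({"disease": "Systemic Lupus Erythematosus", "tissue": "Blood"}), relevant
)
lupus_kw = precision_recall_f(free_text_search(["lupus"]), relevant)
blood_kw = precision_recall_f(free_text_search(["blood"]), relevant)
print(faceted, lupus_kw, blood_kw)
```

Because the F-score is a harmonic mean, the F-score of the averaged precision and recall (2 × 0.228 × 0.607 / 0.835 ≈ 0.33) need not equal an average of per-question F-scores; the abstract does not state which aggregation produced the reported 0.282.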

    Role of genetic testing for inherited prostate cancer risk: Philadelphia prostate cancer consensus conference 2017

    Purpose: Guidelines are limited for genetic testing for prostate cancer (PCA). The goal of this conference was to develop an expert consensus-dri

    Risk profiles and one-year outcomes of patients with newly diagnosed atrial fibrillation in India: Insights from the GARFIELD-AF Registry.

    BACKGROUND: The Global Anticoagulant Registry in the FIELD-Atrial Fibrillation (GARFIELD-AF) is an ongoing prospective noninterventional registry that provides important information on baseline characteristics, treatment patterns, and 1-year outcomes in patients with newly diagnosed non-valvular atrial fibrillation (NVAF). This report describes data from Indian patients recruited into the registry. METHODS AND RESULTS: A total of 52,014 patients with newly diagnosed AF were enrolled globally; of these, 1388 were recruited from 26 sites within India (2012-2016). In India, the mean age at diagnosis of NVAF was 65.8 years. Hypertension was the most prevalent risk factor for AF, present in 68.5% of patients from India and in 76.3% of patients globally (P < 0.001). Diabetes and coronary artery disease (CAD) were present in 36.2% and 28.1% of Indian patients, respectively, compared with a global prevalence of 22.2% and 21.6% (P < 0.001 for both). Antiplatelet therapy was the most common antithrombotic treatment in India. With increasing stroke risk, however, patients were more likely to receive oral anticoagulant therapy [mainly vitamin K antagonist (VKA)], but the international normalized ratio (INR) was lower among Indian patients [median INR 1.6 (interquartile range {IQR}: 1.3-2.3) versus 2.3 (IQR 1.8-2.8) (P < 0.001)]. Compared with other countries, patients from India had markedly higher rates of all-cause mortality [7.68 per 100 person-years (95% confidence interval 6.32-9.35) vs 4.34 (4.16-4.53), P < 0.0001], while rates of stroke/systemic embolism and major bleeding were lower after 1 year of follow-up. CONCLUSION: Compared with previously published registries from India, the GARFIELD-AF registry describes clinical profiles and outcomes in Indian patients with AF of a different etiology. The registry data show that, compared with the rest of the world, Indian AF patients are younger and have more diabetes and CAD. 
Patients with a higher stroke risk are more likely to receive anticoagulation therapy with VKA but are underdosed compared with the global average in GARFIELD-AF. CLINICAL TRIAL REGISTRATION-URL: http://www.clinicaltrials.gov. Unique identifier: NCT01090362
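
Mortality figures such as "7.68 per 100 person-years (95% CI 6.32-9.35)" are event counts divided by total follow-up time. A minimal sketch of a crude rate with an approximate confidence interval follows; the log-scale Poisson approximation and the example counts are assumptions for illustration, not the registry's reported method or data, which may involve adjustment.

```python
import math


def rate_per_100py(events: int, person_years: float, z: float = 1.96) -> tuple[float, float, float]:
    """Crude incidence rate per 100 person-years with an approximate 95% CI.

    Uses the common log-scale approximation for a Poisson count,
    rate * exp(+/- z / sqrt(events)); exact chi-square-based limits
    differ slightly when the event count is small.
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("need a positive event count and positive follow-up time")
    rate = 100 * events / person_years
    half_width = z / math.sqrt(events)  # standard error of log(rate) is 1/sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


# Hypothetical example: 100 deaths over 1,000 person-years of follow-up.
rate, lower, upper = rate_per_100py(100, 1000.0)
print(f"{rate:.2f} per 100 person-years (95% CI {lower:.2f}-{upper:.2f})")
```

From the reported values alone, the crude ratio of the Indian to the global mortality rate is 7.68 / 4.34 ≈ 1.77.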

    The artificial intelligence-based model ANORAK improves histopathological grading of lung adenocarcinoma

    The introduction of the International Association for the Study of Lung Cancer (IASLC) grading system has furthered interest in histopathological grading for risk stratification in lung adenocarcinoma. Complex morphology and high intratumoral heterogeneity challenge pathologists, prompting the development of artificial intelligence (AI) methods. Here we developed ANORAK (pyrAmid pooliNg crOss stReam Attention networK), which encodes multiresolution inputs with an attention mechanism to delineate growth patterns from hematoxylin and eosin-stained slides. In 1,372 lung adenocarcinomas across four independent cohorts, AI-based grading was prognostic of disease-free survival and further assisted pathologists by consistently improving prognostication in stage I tumors. Tumors with discrepant patterns between AI and pathologists had notably higher intratumoral heterogeneity. Furthermore, ANORAK facilitates the morphological and spatial assessment of the acinar pattern, capturing acinus variations with pattern transition. Collectively, our AI method enabled precise quantification and morphological investigation of growth patterns, reflecting intratumoral histological transitions in lung adenocarcinoma.
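
The abstract does not describe ANORAK's layers, so the sketch below shows only the generic mechanism its description points to: scaled dot-product cross-attention, in which features from one resolution stream (queries) weight and pool features from another stream (keys and values). Learned projections, multiple heads and pyramid pooling are deliberately omitted; this is an illustration of the technique under those assumptions, not the published model.

```python
import math

Vector = list[float]


def softmax(scores: list[float]) -> list[float]:
    """Normalize scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: list[Vector], keys: list[Vector], values: list[Vector]) -> list[Vector]:
    """Scaled dot-product attention from one stream onto another.

    Each query (e.g. a high-resolution patch feature) is compared with every key
    (e.g. a low-resolution context feature); the resulting weights average the
    corresponding values into one fused output vector per query.
    """
    d = len(keys[0])
    out: list[Vector] = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


# Toy example: two high-resolution queries attending over two context features.
fused = cross_attention([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]])
print(fused)
```

Each query's output leans toward the value whose key it most resembles, which is how context from a coarser view can be injected into a finer one.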

    Evolutionary characterization of lung adenocarcinoma morphology in TRACERx

    Lung adenocarcinomas (LUADs) display a broad histological spectrum from low-grade lepidic tumors through to mid-grade acinar and papillary and high-grade solid, cribriform and micropapillary tumors. How morphology reflects tumor evolution and disease progression is poorly understood. Whole-exome sequencing data generated from 805 primary tumor regions and 121 paired metastatic samples across 248 LUADs from the TRACERx 421 cohort, together with RNA-sequencing data from 463 primary tumor regions, were integrated with detailed whole-tumor and regional histopathological analysis. Tumors with predominantly high-grade patterns showed increased chromosomal complexity, with higher burden of loss of heterozygosity and subclonal somatic copy number alterations. Individual regions in predominantly high-grade pattern tumors exhibited higher proliferation and lower clonal diversity, potentially reflecting large recent subclonal expansions. Co-occurrence of truncal loss of chromosomes 3p and 3q was enriched in predominantly low-/mid-grade tumors, while purely undifferentiated solid-pattern tumors had a higher frequency of truncal arm or focal 3q gains and SMARCA4 gene alterations compared with mixed-pattern tumors with a solid component, suggesting distinct evolutionary trajectories. Clonal evolution analysis revealed that tumors tend to evolve toward higher-grade patterns. The presence of micropapillary pattern and ‘tumor spread through air spaces’ were associated with intrathoracic recurrence, in contrast to the presence of solid/cribriform patterns, necrosis and preoperative circulating tumor DNA detection, which were associated with extra-thoracic recurrence. These data provide insights into the relationship between LUAD morphology, the underlying evolutionary genomic landscape, and clinical and anatomical relapse risk.
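
The low-, mid- and high-grade pattern tiers above are the inputs to IASLC grading of invasive non-mucinous adenocarcinoma. As that system is commonly stated, a tumour with 20% or more high-grade patterns (solid, micropapillary, cribriform or complex glandular) is grade 3; otherwise a lepidic-predominant tumour is grade 1 and an acinar- or papillary-predominant tumour is grade 2. The sketch below encodes that rule from pattern percentages; the pattern keys are hypothetical names, and this is not the grading code used in TRACERx.

```python
HIGH_GRADE = {"solid", "micropapillary", "cribriform", "complex_glandular"}
MID_GRADE = {"acinar", "papillary"}


def iaslc_grade(pattern_pct: dict[str, float]) -> int:
    """Assign grade 1-3 from growth-pattern percentages (expected to sum to ~100)."""
    if not pattern_pct:
        raise ValueError("no growth patterns given")
    high_grade_pct = sum(pct for pattern, pct in pattern_pct.items() if pattern in HIGH_GRADE)
    if high_grade_pct >= 20:
        return 3
    # Predominant pattern = largest percentage (ties resolve to the first key listed).
    predominant = max(pattern_pct, key=pattern_pct.get)
    if predominant == "lepidic":
        return 1
    if predominant in MID_GRADE:
        return 2
    # A high-grade predominant pattern with <20% high-grade in total should not
    # arise when the listed patterns sum to 100%; fall back to grade 3.
    return 3


print(iaslc_grade({"lepidic": 70, "acinar": 25, "solid": 5}))
print(iaslc_grade({"lepidic": 60, "acinar": 20, "cribriform": 20}))
```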

    The evolution of lung cancer and impact of subclonal selection in TRACERx

    Lung cancer is the leading cause of cancer-associated mortality worldwide [1]. Here we analysed 1,644 tumour regions sampled at surgery or during follow-up from the first 421 patients with non-small cell lung cancer prospectively enrolled into the TRACERx study. This project aims to decipher lung cancer evolution and address the primary study endpoint: determining the relationship between intratumour heterogeneity and clinical outcome. In lung adenocarcinoma, mutations in 22 out of 40 common cancer genes were under significant subclonal selection, including classical tumour initiators such as TP53 and KRAS. We defined evolutionary dependencies between drivers, mutational processes and whole genome doubling (WGD) events. Although the patients had a history of smoking, 8% of lung adenocarcinomas lacked evidence of tobacco-induced mutagenesis. These tumours also had similar detection rates for EGFR mutations and for RET, ROS1, ALK and MET oncogenic isoforms compared with tumours in never-smokers, which suggests that they have a similar aetiology and pathogenesis. Large subclonal expansions were associated with positive subclonal selection. Patients with tumours harbouring recent subclonal expansions, on the terminus of a phylogenetic branch, had significantly shorter disease-free survival. Subclonal WGD was detected in 19% of tumours, and 10% of tumours harboured multiple subclonal WGDs in parallel. Subclonal, but not truncal, WGD was associated with shorter disease-free survival. Copy number heterogeneity was associated with extrathoracic relapse within 1 year after surgery. These data demonstrate the importance of clonal expansion, WGD and copy number instability in determining the timing and patterns of relapse in non-small cell lung cancer and provide a comprehensive clinical cancer evolutionary data resource.

    The evolution of non-small cell lung cancer metastases in TRACERx

    Metastatic disease is responsible for the majority of cancer-related deaths [1]. We report the longitudinal evolutionary analysis of 126 non-small cell lung cancer (NSCLC) tumours from patients who developed metastatic disease, drawn from the 421 patients prospectively recruited in TRACERx, compared with a control cohort of 144 non-metastatic tumours. In 25% of cases, metastases diverged early, before the last clonal sweep in the primary tumour, and early divergence was enriched for patients who were smokers at the time of initial diagnosis. Simulations suggested that early metastatic divergence occurred more frequently at smaller tumour diameters (less than 8 mm). Single-region primary tumour sampling resulted in 83% of late divergence cases being misclassified as early, highlighting the importance of extensive primary tumour sampling. Polyclonal dissemination, which was associated with extrathoracic disease recurrence, was found in 32% of cases. Primary lymph node disease contributed to metastatic relapse in fewer than 20% of cases, representing a hallmark of metastatic potential rather than a route to subsequent recurrences/disease progression. Metastasis-seeding subclones exhibited subclonal expansions within primary tumours, probably reflecting positive selection. Our findings highlight the importance of selection in metastatic clone evolution within untreated primary tumours, the distinction between monoclonal versus polyclonal seeding in dictating site of recurrence, the limitations of current radiological screening approaches for early diverging tumours and the need to develop strategies to target metastasis-seeding subclones before relapse.
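
Why single-region sampling inflates early-divergence calls can be shown with a deliberately simplified, assumed rule: a metastasis is called early-diverging if it lacks any mutation that is clonal in the primary tumour, where "clonal" means present in every sampled region. With one region, a subclonal mutation confined to that region looks clonal, so a late-diverging metastasis that lacks it is misclassified as early. The mutation names and region sets are hypothetical, and TRACERx's phylogenetic reconstruction is considerably more involved.

```python
def clonal_in_primary(regions: list[set[str]]) -> set[str]:
    """Mutations present in every sampled primary region (treated as clonal)."""
    if not regions:
        raise ValueError("at least one primary region is required")
    return set.intersection(*regions)


def divergence(primary_regions: list[set[str]], metastasis: set[str]) -> str:
    """'early' if the metastasis lacks any apparently clonal primary mutation, else 'late'."""
    missing = clonal_in_primary(primary_regions) - metastasis
    return "early" if missing else "late"


# Hypothetical tumour: truncal mutations T1 and T2; subclone S1 in regions 1-2,
# subclone S2 in region 3. The metastasis was seeded late, from the S2 subclone.
regions = [{"T1", "T2", "S1"}, {"T1", "T2", "S1"}, {"T1", "T2", "S2"}]
met = {"T1", "T2", "S2", "M1"}

print(divergence(regions, met))      # all three regions sampled -> late
print(divergence(regions[:1], met))  # only region 1 sampled -> early (S1 looks clonal)
```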

    Genomic–transcriptomic evolution in lung cancer and metastasis

    Intratumour heterogeneity (ITH) fuels lung cancer evolution, which leads to immune evasion and resistance to therapy [1]. Here, using paired whole-exome and RNA sequencing data, we investigate intratumour transcriptomic diversity in 354 non-small cell lung cancer tumours from 347 out of the first 421 patients prospectively recruited into the TRACERx study [2,3]. Analyses of 947 tumour regions, representing both primary and metastatic disease, alongside 96 tumour-adjacent normal tissue samples implicate the transcriptome as a major source of phenotypic variation. Gene expression levels and ITH relate to patterns of positive and negative selection during tumour evolution. We observe frequent copy number-independent allele-specific expression that is linked to epigenomic dysfunction. Allele-specific expression can also result in genomic–transcriptomic parallel evolution, which converges on cancer gene disruption. We extract signatures of RNA single-base substitutions and link their aetiology to the activity of the RNA-editing enzymes ADAR and APOBEC3A, thereby revealing otherwise undetected ongoing APOBEC activity in tumours. Characterizing the transcriptomes of primary–metastatic tumour pairs, we combine multiple machine-learning approaches that leverage genomic and transcriptomic variables to link metastasis-seeding potential to the evolutionary context of mutations and increased proliferation within primary tumour regions. These results highlight the interplay between the genome and transcriptome in influencing ITH, lung cancer evolution and metastasis.
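
Allele-specific expression at a heterozygous site can be screened by asking whether the RNA allele fraction departs from what the DNA predicts. The sketch below assumes an exact two-sided binomial test against the DNA-derived allele fraction; the read counts and threshold are hypothetical, and the abstract does not describe the TRACERx pipeline. Using the DNA allele fraction as the null, instead of 0.5, is what makes the call copy number-aware: an imbalance already explained by allelic copy number is not flagged.

```python
import math


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so outcomes tied with the observed one are counted.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def allele_specific_expression(
    rna_alt: int, rna_total: int, dna_alt_fraction: float, alpha: float = 0.01
) -> bool:
    """Flag a site whose RNA alt-allele reads deviate from the DNA-expected fraction."""
    return binom_two_sided_p(rna_alt, rna_total, dna_alt_fraction) < alpha


# Hypothetical heterozygous site with 45 of 50 RNA reads carrying the alt allele.
print(allele_specific_expression(45, 50, 0.5))  # balanced DNA: imbalance is flagged
print(allele_specific_expression(45, 50, 0.9))  # DNA already skewed by copy number: not flagged
```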

    Factors Associated with Revision Surgery after Internal Fixation of Hip Fractures

    Background: Femoral neck fractures are associated with high rates of revision surgery after management with internal fixation. Using data from the Fixation using Alternative Implants for the Treatment of Hip fractures (FAITH) trial evaluating methods of internal fixation in patients with femoral neck fractures, we investigated associations between baseline and surgical factors and the need for revision surgery to promote healing, relieve pain, treat infection or improve function over 24 months after surgery. Additionally, we investigated factors associated with (1) hardware removal and (2) implant exchange from cancellous screws (CS) or sliding hip screw (SHS) to total hip arthroplasty, hemiarthroplasty, or another internal fixation device. Methods: We identified a priori 15 potential factors that may be associated with revision surgery, 7 with hardware removal, and 14 with implant exchange, and analyzed them using multivariable Cox proportional hazards models. Results: Factors associated with increased risk of revision surgery included: female sex [hazard ratio (HR) 1.79, 95% confidence interval (CI) 1.25-2.50; P = 0.001], higher body mass index (fo
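
Cox model confidence intervals are built on the log scale, so a reported hazard ratio and its 95% CI imply a standard error and an approximate p-value. This sketch back-calculates both, assuming the interval is log(HR) ± 1.96 × SE; because published values are rounded, the result is only a consistency check, not a re-analysis of the trial data.

```python
import math


def p_from_hr_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate z statistic and two-sided p-value from a hazard ratio and its 95% CI.

    Assumes the CI was constructed as exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return z, p


# Female sex, as reported above: HR 1.79 (95% CI 1.25-2.50).
z, p = p_from_hr_ci(1.79, 1.25, 2.50)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For the female-sex estimate this gives z of about 3.3 and p of about 0.001, in line with the reported P = 0.001.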

    Tracking early lung cancer metastatic dissemination in TRACERx using ctDNA

    Circulating tumour DNA (ctDNA) can be used to detect and profile residual tumour cells persisting after curative intent therapy [1]. The study of large patient cohorts incorporating longitudinal plasma sampling and extended follow-up is required to determine the role of ctDNA as a phylogenetic biomarker of relapse in early-stage non-small-cell lung cancer (NSCLC). Here we developed ctDNA methods tracking a median of 200 mutations identified in resected NSCLC tissue across 1,069 plasma samples collected from 197 patients enrolled in the TRACERx study [2]. A lack of preoperative ctDNA detection distinguished biologically indolent lung adenocarcinoma with good clinical outcome. Postoperative plasma analyses were interpreted within the context of standard-of-care radiological surveillance and administration of cytotoxic adjuvant therapy. Landmark analyses of plasma samples collected within 120 days after surgery revealed ctDNA detection in 25% of patients, including 49% of all patients who experienced clinical relapse; ctDNA surveillance every 3 to 6 months identified impending disease relapse in an additional 20% of landmark-negative patients. We developed a bioinformatic tool (ECLIPSE) for non-invasive tracking of subclonal architecture at low ctDNA levels. ECLIPSE identified patients with polyclonal metastatic dissemination, which was associated with a poor clinical outcome. By measuring subclone cancer cell fractions in preoperative plasma, we found that subclones seeding future metastases were significantly more expanded compared with non-metastatic subclones. Our findings will support (neo)adjuvant trial advances and provide insights into the process of metastatic dissemination using low-ctDNA-level liquid biopsy.
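
A subclone's cancer cell fraction (CCF) is usually derived from a mutation's variant allele fraction (VAF) after correcting for tumour content and copy number. The sketch below uses the standard single-sample relation, with the plasma ctDNA fraction standing in for purity; that substitution and the example values are assumptions for illustration, since the abstract does not describe ECLIPSE's model.

```python
def cancer_cell_fraction(
    vaf: float, purity: float, tumour_cn: int, multiplicity: int = 1, normal_cn: int = 2
) -> float:
    """Fraction of cancer cells carrying a mutation, from its variant allele fraction.

    Solves VAF = purity * multiplicity * CCF / (purity * tumour_cn + (1 - purity) * normal_cn)
    for CCF. In plasma, `purity` is taken to be the ctDNA (tumour) fraction.
    Noisy data can give values slightly above 1; the result is not capped here.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn  # average copies per cell
    return vaf * total_copies / (purity * multiplicity)


# Hypothetical plasma sample at 2% ctDNA, diploid tumour: a clonal mutation would
# sit near VAF 0.01, so a mutation observed at VAF 0.004 implies CCF of about 0.4.
print(round(cancer_cell_fraction(0.004, 0.02, 2), 3))
```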